Smart Paper Notes

NAF: Neural Attenuation Fields for Sparse-View CBCT Reconstruction

Ruyi Zha, Yanhao Zhang, Hongdong Li

Category: Computer Vision

2022-09-29

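This paper proposes a novel and fast self-supervised solution for sparse-view cone-beam computed tomography (CBCT) reconstruction that requires no external training data. Specifically, the desired attenuation coefficients are represented as a continuous function of 3D spatial coordinates, parameterized by a fully connected deep neural network. We synthesize projections discretely and train the network by minimizing the error between the real and synthesized projections. A learning-based encoder built on hash coding is adopted to help the network capture high-frequency details. This encoder outperforms the commonly used frequency-domain encoder in both performance and efficiency because it exploits the smoothness and sparsity of human organs. Experiments were conducted on both human organ and phantom datasets. The proposed method achieves state-of-the-art accuracy with a reasonably short computation time.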

translated by Google Translate
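The NAF abstract above describes synthesizing projections discretely from a continuous attenuation field. A minimal sketch of that idea follows, assuming the standard Beer-Lambert view in which a projection value is the line integral of attenuation along a ray. The names `sphere_attenuation` and `synthesize_projection` are hypothetical, and an analytic sphere stands in for the trained network; this is not the authors' implementation.

```python
import numpy as np

def sphere_attenuation(points, radius=1.0, mu=0.5):
    """Hypothetical stand-in for the network: attenuation mu inside a
    sphere centred at the origin, zero outside."""
    return np.where(np.linalg.norm(points, axis=-1) <= radius, mu, 0.0)

def synthesize_projection(field, origin, direction, near, far, n_samples=1000):
    """Approximate the line integral of `field` along one ray using the
    midpoint rule with `n_samples` equally spaced samples."""
    direction = np.asarray(direction, dtype=np.float64)
    direction = direction / np.linalg.norm(direction)
    delta = (far - near) / n_samples
    # Sample at the midpoint of each ray segment.
    t = near + (np.arange(n_samples) + 0.5) * delta
    points = np.asarray(origin, dtype=np.float64) + t[:, None] * direction
    return float(np.sum(field(points)) * delta)

# A ray through the sphere centre crosses a chord of length 2, so the
# exact line integral is 2 * 0.5 = 1.0.
line_integral = synthesize_projection(
    sphere_attenuation,
    origin=[-3.0, 0.0, 0.0],
    direction=[1.0, 0.0, 0.0],
    near=0.0,
    far=6.0,
)
```

In a self-supervised setup of the kind the abstract describes, `field` would be the network, and the training loss would compare such synthesized values against the measured projections.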

Estimating Latent Population Flows from Aggregated Data via Inversing Multi-Marginal Optimal Transport

Sikun Yang, Hongyuan Zha

Category: Machine Learning

2022-12-30

We study the problem of estimating latent population flows from aggregated count data. This problem arises when individual trajectories are unavailable because of privacy concerns or limited measurement fidelity. Instead, aggregated observations are measured at discrete time points and used to estimate the population flows among states. Most related studies tackle the problem by learning the transition parameters of a time-homogeneous Markov process. However, most real-world population flows are influenced by uncertainties such as traffic jams and weather conditions, so in many cases a time-homogeneous Markov model is a poor approximation of the much more complex population flows. To circumvent this difficulty, we resort to a multi-marginal optimal transport (MOT) formulation that naturally represents aggregated observations as constrained marginals and encodes time-dependent transition matrices in the cost functions. In particular, we propose to estimate the transition flows from aggregated data by learning the cost functions of the MOT framework, which enables us to capture time-varying dynamic patterns. Experiments show that the proposed algorithms estimate several real-world transition flows more accurately than related methods.


Bring Your Own View: Graph Neural Networks for Link Prediction with Personalized Subgraph Selection

Qiaoyu Tan, Xin Zhang, Ninghao Liu, Daochen Zha, Li Li, Rui Chen, Soo-Hyun Choi, Xia Hu

Category: Machine Learning

2022-12-23

Graph neural networks (GNNs) have achieved remarkable success in link prediction (GNNLP) tasks. Existing efforts first predefine a subgraph for the whole dataset and then apply GNNs to encode edge representations by leveraging the neighborhood structure induced by that fixed subgraph. The performance of GNNLP methods therefore relies heavily on this ad hoc subgraph. Because node connectivity in real-world graphs is complex, a single shared subgraph is a limited choice for all edges, so the choice of subgraph should be personalized to each edge. However, personalized subgraph selection is nontrivial because the potential selection space grows exponentially with the number of edges. In addition, the inference edges are not available during training in link prediction scenarios, so the selection process needs to be inductive. To bridge the gap, we introduce a Personalized Subgraph Selector (PS2) as a plug-and-play framework that automatically, personally, and inductively identifies optimal subgraphs for different edges when performing GNNLP. PS2 is instantiated as a bi-level optimization problem that can be efficiently solved differentiably. Coupling GNNLP models with PS2, we suggest a brand-new angle on GNNLP training: first identify the optimal subgraphs for edges, then focus on training the inference model using the sampled subgraphs. Comprehensive experiments confirm the effectiveness of the proposed method across various GNNLP backbones (GCN, GraphSage, NGCF, LightGCN, and SEAL) and diverse benchmarks (Planetoid, OGB, and Recommendation datasets). Our code is publicly available at https://github.com/qiaoyu-tan/PS2.


Learning to Dub Movies via Hierarchical Prosody Models

Gaoxiang Cong, Liang Li, Yuankai Qi, Zhengjun Zha, Qi Wu, Wenyu Wang, Bin Jiang, Ming-Hsuan Yang, Qingming Huang

Category: Natural Language Processing

2022-12-08

Given a piece of text, a video clip, and a reference audio, the movie dubbing task (also known as visual voice cloning, V2C) aims to generate speech that matches the speaker's emotion presented in the video, using the desired speaker's voice as reference. V2C is more challenging than conventional text-to-speech tasks because it additionally requires the generated speech to exactly match the varying emotions and speaking speed presented in the video. Unlike previous works, we propose a novel movie dubbing architecture that tackles these problems via hierarchical prosody modelling, which bridges visual information to the corresponding speech prosody from three aspects: lip, face, and scene. Specifically, we align lip movement with speech duration, and convey facial expression to speech energy and pitch via an attention mechanism based on valence and arousal representations inspired by recent psychology findings. Moreover, we design an emotion booster to capture the atmosphere of global video scenes. All these embeddings together are used to generate a mel-spectrogram, which is then converted to speech waveforms by an existing vocoder. Extensive experimental results on the Chem and V2C benchmark datasets demonstrate the favorable performance of the proposed method. The source code and trained models will be released to the public.


Generative Data Augmentation for Non-IID Problem in Decentralized Clinical Machine Learning

Zirui Wang, Shaoming Duan, Chengyue Wu, Wenhao Lin, Xinyu Zha, Peiyi Han, Chuanyi Liu

Category: Machine Learning

2022-12-02

Swarm learning (SL) is an emerging and promising decentralized machine learning paradigm that has achieved high performance in clinical applications. SL removes the central structure required by federated learning by combining edge computing with a blockchain-based peer-to-peer network. While results are promising under the assumption of independent and identically distributed (IID) data across participants, SL suffers from performance degradation as the degree of non-IID data increases. To address this problem, we propose a generative augmentation framework for swarm learning called SL-GAN, which augments the non-IID data by generating synthetic data from participants. SL-GAN trains generators and discriminators locally and periodically aggregates them via a randomly elected coordinator in the SL network. Under standard assumptions, we theoretically prove the convergence of SL-GAN using stochastic approximations. Experimental results demonstrate that SL-GAN outperforms state-of-the-art methods on three real-world clinical datasets: Tuberculosis, Leukemia, and COVID-19.


Efficient stereo matching on embedded GPUs with zero-means cross correlation

Qiong Chang, Aolong Zha, Weimin Wang, Xin Liu, Masaki Onishi, Lei Lei, Meng Joo Er, Tsutomu Maruyama

Category: Computer Vision

2022-12-01

Mobile stereo-matching systems have become an important part of many applications, such as automated-driving vehicles and autonomous robots. Accurate stereo-matching methods usually have high computational complexity, whereas mobile platforms have only limited hardware resources in order to keep their power consumption low. This makes it difficult to maintain both acceptable processing speed and accuracy on mobile platforms. To resolve this trade-off, we propose a novel acceleration approach for the well-known zero-means normalized cross correlation (ZNCC) matching cost calculation algorithm on a Jetson Tx2 embedded GPU. In our method, target images are scanned in a zigzag fashion so that one pixel's computation is efficiently reused for its neighboring pixels. This reduces the amount of data transmission and increases the utilization of on-chip registers, thus increasing the processing speed. As a result, our method is 2X faster than the traditional image scanning method and 26% faster than the latest NCC method. By combining this technique with the domain transformation (DT) algorithm, our system achieves a real-time processing speed of 32 fps on a Jetson Tx2 GPU for 1,280x384 pixel images with a maximum disparity of 128. Additionally, evaluation results on the KITTI 2015 benchmark show that our combined system is more accurate than the same algorithm combined with census by 7.26%, while maintaining almost the same processing speed.
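For readers unfamiliar with the ZNCC matching cost named in the abstract above, here is a minimal NumPy sketch of the textbook definition for two equally sized patches. It illustrates only the cost itself, not the zigzag scanning or GPU register reuse the paper proposes; the function name `zncc` and the `eps` guard for flat patches are assumptions made for this illustration.

```python
import numpy as np

def zncc(patch_a, patch_b, eps=1e-12):
    """Zero-mean normalized cross correlation of two equally sized patches.

    Returns a value in [-1, 1]; returns 0.0 when either patch has
    (near-)zero variance, where the score is undefined.
    """
    a = np.asarray(patch_a, dtype=np.float64)
    b = np.asarray(patch_b, dtype=np.float64)
    # Subtract each patch's mean so the score ignores brightness offsets.
    a = a - a.mean()
    b = b - b.mean()
    # Normalize by the patch energies so the score ignores contrast gain.
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    if denom < eps:
        return 0.0
    return float((a * b).sum() / denom)

left = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0],
                 [7.0, 8.0, 10.0]])
same_up_to_gain = zncc(left, 2.0 * left + 5.0)   # affine brightness change
inverted = zncc(left, -left)                     # contrast-inverted patch
```

Because the mean and the energy are both normalized away, the score is unchanged by an affine brightness change between the two views, which is one common reason ZNCC is used as a stereo matching cost.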


Parameter and Data Efficient Continual Pre-training for Robustness to Dialectal Variance in Arabic

Soumajyoti Sarkar, Kaixiang Lin, Sailik Sengupta, Leonard Lausen, Sheng Zha, Saab Mansour

Category: Natural Language Processing | Machine Learning

2022-11-08

The use of multilingual language models for tasks in low- and high-resource languages has been a success story in deep learning. Recently, Arabic has received widespread attention on account of its dialectal variance. While prior studies have tried to adapt these multilingual models to dialectal variants of Arabic, this remains a challenging problem owing to the lack of sufficient monolingual dialectal data and parallel translation data for such variants. Whether the limited dialectal data can be used to improve models trained on Arabic for its dialectal variants remains an open problem. First, we show that multilingual BERT (mBERT) incrementally pretrained on Arabic monolingual data takes less training time and yields comparable accuracy when compared with our custom monolingual Arabic model, and beats existing models (by +6.41 on the average metric). We then explore two continual pre-training methods: (1) using small amounts of dialectal data for continual finetuning, and (2) using parallel Arabic-to-English data with a Translation Language Modeling loss function. We show that both approaches help improve performance on dialectal classification tasks (+4.64 average gain) when used on monolingual models.


Solving Math Word Problem via Cooperative Reasoning induced Language Models

Xinyu Zhu, Junjie Wang, Lin Zhang, Yuxiang Zhang, Ruyi Gan, Jiaxing Zhang, Yujiu Yang

Category: Natural Language Processing

2022-10-28

Large-scale pre-trained language models (PLMs) bring new opportunities for challenging problems, especially those that need high-level intelligence, such as math word problems (MWPs). However, directly applying existing PLMs to MWPs can fail because the generation process lacks sufficient supervision and thus lacks the fast adaptivity of humans. We notice that human reasoning has a dual framework consisting of an immediate reaction system (system 1) and a delicate reasoning system (system 2), where the entire reasoning process is determined by their interaction. This inspires us to develop a cooperative reasoning-induced PLM for solving MWPs, called Cooperative Reasoning (CoRe), resulting in a human-like reasoning architecture with system 1 as the generator and system 2 as the verifier. In our approach, the generator is responsible for generating reasoning paths, and the verifiers supervise the evaluation in order to provide reliable feedback to the generator. We evaluate the CoRe framework on several mathematical reasoning datasets and achieve decent improvement over state-of-the-art methods, with up to a 9.8% increase over the best baselines.


Differentially Private Estimation of Hawkes Process

Simiao Zuo, Tianyi Liu, Tuo Zhao, Hongyuan Zha

Category: Machine Learning | Machine Learning (Statistics)

2022-09-15

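Point process models are of great importance in real-world applications. In certain critical applications, estimating point process models involves large amounts of sensitive personal data from users. Privacy concerns naturally arise, and they have not been addressed in the existing literature. To bridge this glaring gap, we propose the first general differentially private estimation procedure for point process models. Specifically, we take the Hawkes process as an example and introduce a rigorous definition of differential privacy for event stream data based on a discretized representation of the Hawkes process. We then propose two differentially private optimization algorithms that can efficiently estimate Hawkes process models with the desired privacy and utility guarantees under two different settings. Experiments are provided to support our theoretical analysis.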
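As background for the Hawkes process mentioned in the abstract above, the following is a minimal sketch of a self-exciting conditional intensity, assuming the commonly used exponential kernel. The abstract does not state which kernel or parameterization the paper uses, so the function name `hawkes_intensity` and the parameters `baseline`, `alpha`, and `beta` are illustrative assumptions, and the sketch does not cover the differentially private estimation itself.

```python
import numpy as np

def hawkes_intensity(t, event_times, baseline=0.2, alpha=0.5, beta=1.0):
    """Conditional intensity at time t for an exponential-kernel Hawkes
    process: baseline + alpha * sum(exp(-beta * (t - t_i))) over all
    past events t_i < t. All parameter values are illustrative."""
    events = np.asarray(event_times, dtype=np.float64)
    past = events[events < t]  # only strictly earlier events excite
    return float(baseline + alpha * np.sum(np.exp(-beta * (t - past))))

quiet = hawkes_intensity(1.0, [])            # no history: baseline only
excited = hawkes_intensity(1.0, [0.0, 0.5])  # two recent events raise the rate
```

Each past event temporarily raises the event rate, and that boost decays over time, which is why the full event history of a user is sensitive information in this model.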


Fengshenbang 1.0: Being the Foundation of Chinese Cognitive Intelligence

Junjie Wang, Yuxiang Zhang, Lin Zhang, Ping Yang, Xinyu Gao, Ziwei Wu, Xiaoqun Dong, Junqing He, Jianheng Zhuo, Qi Yang

Category: Natural Language Processing

2022-09-07

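Nowadays, foundation models have become one of the fundamental infrastructures of artificial intelligence, paving the way toward general intelligence. However, reality presents two urgent challenges: existing foundation models are dominated by the English-language community, and users often have limited resources and thus cannot always use foundation models. To support the development of the Chinese-language community, we introduce an open-source project called Fengshenbang, led by the research center for Cognitive Computing and Natural Language (CCNL). Our project has comprehensive capabilities, including large pre-trained models, user-friendly APIs, benchmarks, datasets, and more. We wrap all of these in three sub-projects: the Fengshenbang Model, the Fengshen Framework, and the Fengshen Benchmark. The Fengshenbang open-source roadmap aims to re-evaluate the open-source community of Chinese pre-trained large-scale models, prompting the development of the entire Chinese large-scale model community. We also want to build a user-centered open-source ecosystem that allows individuals to access models matched to their computing resources. Furthermore, we invite companies, universities, and research institutions to collaborate with us in building an ecosystem of large-scale open-source models. We hope that this project will become the foundation of Chinese cognitive intelligence.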
